Skip to content

fix: resolve AgentRuntime by Deployment name and inject trace env vars into agent containers#245

Open
varshaprasad96 wants to merge 1 commit intokagenti:feat/webhook-migrationfrom
varshaprasad96:fix/agentruntime-trace-env-propagation
Open

fix: resolve AgentRuntime by Deployment name and inject trace env vars into agent containers#245
varshaprasad96 wants to merge 1 commit intokagenti:feat/webhook-migrationfrom
varshaprasad96:fix/agentruntime-trace-env-propagation

Conversation

@varshaprasad96
Copy link
Copy Markdown
Contributor

@varshaprasad96 varshaprasad96 commented Mar 26, 2026

Summary

Two fixes for AuthBridge webhook injection:

  • ReplicaSet name resolution: ReadAgentRuntimeOverrides now strips the pod-template-hash suffix from the ReplicaSet name to match the Deployment name in targetRef. Without this, the webhook couldn't find the AgentRuntime CR because it received myapp-7d4f8b9c5 but targetRef.name is myapp.

  • Trace env var injection into agent containers: Injects OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, and OTEL_TRACES_SAMPLER_ARG from AgentRuntime spec.trace into the user's agent containers (not sidecars). This enables agent-level telemetry (e.g., MLflow, LangChain, OpenTelemetry SDK). Developer-set env vars are never overwritten.

Design decisions

  • User container, not sidecar: The trace config in AgentRuntime is the developer's declaration about their workload's telemetry, not platform-level sidecar config. Sidecar containers are explicitly excluded via sidecarContainerNames map.
  • No-overwrite semantics: If the developer already sets OTEL_EXPORTER_OTLP_ENDPOINT on their container, the webhook respects their value.
  • No reconciliation loops: The webhook mutates the Pod at admission time, not the Deployment. The Deployment's PodTemplateSpec stays untouched.

Changes

File Description
injector/agentruntime_config.go Strip pod-template-hash suffix for Deployment name matching
injector/agentruntime_config_test.go Tests for ReplicaSet and multi-hyphen name matching
injector/pod_mutator.go injectTraceEnvVars() into user containers; hoist resolved variable
injector/pod_mutator_test.go Tests for trace injection, no-overwrite, sidecar exclusion
injector/container_builder.go Formatting cleanup only

Test plan

  • 7 new unit tests pass (ReplicaSet matching, multi-hyphen names, trace injection, no-overwrite, no-override)
  • All 85+ existing tests pass
  • E2E verified on kind cluster: OTEL env vars appear on agent container, not on envoy-proxy

🤖 Generated with Claude Code

@varshaprasad96 varshaprasad96 marked this pull request as ready for review March 26, 2026 18:26
@varshaprasad96 varshaprasad96 requested a review from a team as a code owner March 26, 2026 18:26
@varshaprasad96 varshaprasad96 force-pushed the fix/agentruntime-trace-env-propagation branch from 5a3408e to b53162c Compare March 26, 2026 21:45
Two fixes for the AuthBridge webhook injection:

1. ReadAgentRuntimeOverrides now strips the pod-template-hash suffix from
   the ReplicaSet name to match the Deployment name in targetRef. Without
   this, the webhook couldn't find the AgentRuntime CR because it received
   "myapp-7d4f8b9c5" but targetRef.name is "myapp".

2. Inject OTEL trace env vars (OTEL_EXPORTER_OTLP_ENDPOINT,
   OTEL_EXPORTER_OTLP_PROTOCOL, OTEL_TRACES_SAMPLER_ARG) from
   AgentRuntime spec.trace into the user's agent containers, not the
   sidecar. These configure the agent's own telemetry (e.g., MLflow,
   LangChain). Existing env vars are not overwritten — developer values
   take precedence. Sidecar containers are skipped.

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
@varshaprasad96 varshaprasad96 force-pushed the fix/agentruntime-trace-env-propagation branch from b53162c to 7596b7d Compare March 27, 2026 03:07
@varshaprasad96 varshaprasad96 changed the title fix: resolve AgentRuntime by Deployment name and propagate trace env … fix: resolve AgentRuntime by Deployment name and inject trace env vars into agent containers Mar 27, 2026
Copy link
Copy Markdown
Collaborator

@cwiklik cwiklik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Clean, well-scoped PR fixing two related issues: ReplicaSet-to-Deployment name resolution and OTEL trace env var injection into user containers. The code is correct, no-overwrite semantics are sound, and test coverage is thorough. The exact-match-first approach for name resolution handles edge cases well.

Areas reviewed: Go, Tests
Commits: 1 commit, signed-off: yes, proper Assisted-By trailer
CI status: DCO passing

Excellent PR description with clear design decisions and a completed test plan.

// Derive the Deployment name by stripping the pod-template-hash suffix.
// ReplicaSet names follow the pattern "<deployment>-<pod-template-hash>".
deploymentName := workloadName
if idx := strings.LastIndex(workloadName, "-"); idx > 0 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The LastIndex stripping is correct for the common case and the exact-match-first approach handles ambiguity well. One edge case worth noting: a Deployment literally named with a hash-like suffix (e.g. my-agent-7d4f8b9c5) would be ambiguous, but exact match takes priority so it still works. Might be worth a brief comment noting this precedence for future readers.

// Inject OTEL trace env vars into the user's (non-sidecar) containers.
// These come from the AgentRuntime spec.trace overrides and configure
// the agent's own telemetry (e.g., MLflow, LangChain, OpenTelemetry SDK).
if resolved != nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Trace env vars are only injected when PerWorkloadConfigResolution is enabled (since resolved is nil on the legacy path). This seems intentional — a brief comment noting "trace injection requires the resolved config path" would clarify this for future maintainers.

// injectTraceEnvVars adds OTEL trace env vars to all user (non-sidecar)
// containers. Existing env vars are not overwritten — if the developer already
// set OTEL_EXPORTER_OTLP_ENDPOINT, the webhook respects their value.
func injectTraceEnvVars(podSpec *corev1.PodSpec, resolved *ResolvedConfig) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The log "vars", len(traceEnv) always logs the total count of candidate trace vars, not the number actually injected (some may be skipped due to existing vars). Consider logging the injected count, or rename to "candidateVars".

EnvoyProxyContainerName: true,
ProxyInitContainerName: true,
SpiffeHelperContainerName: true,
ClientRegistrationContainerName: true,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Good approach — centralizing sidecar names in a map with a MAINTENANCE comment is clean and maintainable. The no-overwrite semantics are well-implemented.

Type: agentv1alpha1.RuntimeTypeAgent,
TargetRef: agentv1alpha1.TargetRef{
APIVersion: "apps/v1",
Kind: "Deployment",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thorough test coverage for both single-hyphen and multi-hyphen deployment names. Good test structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants